COCO: an annotated Twitter dataset of COVID-19 conspiracy theories

The COVID-19 pandemic has been accompanied by a surge of misinformation on social media which covered a wide range of different topics and contained many competing narratives, including conspiracy theories. To study such conspiracy theories, we created a dataset of 3495 tweets with manual labeling of the stance of each tweet w.r.t. 12 different conspiracy topics. The dataset thus contains almost 42,000 labels, each of which determined by majority among three expert annotators. The dataset was selected from COVID-19 related Twitter data spanning from January 2020 to June 2021 using a list of 54 keywords. The dataset can be used to train machine learning based classifiers for both stance and topic detection, either individually or simultaneously. BERT was used successfully for the combined task. The dataset can also be used to further study the prevalence of different conspiracy narratives. To this end we qualitatively analyze the tweets, discussing the structure of conspiracy narratives that are frequently found in the dataset. Furthermore, we illustrate the interconnection between the conspiracy categories as well as the keywords.


Introduction
The COVID-19 pandemic severely affected the entire world, and consequently it has dominated world news and social media throughout years 2020 and 2021. Along with this media attention, an abundance of misinformation has swept through social media [1]. The pandemic has demonstrated the crucial role that misinformation plays when societies are faced with unfamiliar circumstances, and how highly implausible claims can have a dramatic real-world impact.
While initially there was a great deal of genuine uncertainty about the origin of the virus, its effects, and the vaccines, with different experts supporting different assertions, a large number of ideas that are scientifically impossible or highly implausible were promoted by non-experts on social media and other channels. Many of these ideas took the form of conspiracy theories which provided easy explanations for the complex medical and societal events occurring during the COVID-19 pandemic, usually in the form that events happen due to the hidden influence of some prominent individual or group [2].
We use the common term conspiracy theories for narratives that consist of disproved or unproven accusations against any individual or any group perceived as powerful to give an explanation for impactful economic, cultural, social, political, or other events by utilizing claims of clandestine malevolent schemes [3,4]. While believes in paranormal powers, supernatural entities, or pseudoscience may also be a part of these narrations, we focused on clandestine malevolent schemes to cluster related conspiracy theories into categories. The spreading of conspiracy theories increased substantially during the COVID-19 pandemic [5,6] and they were among the most prominent misinformation phenomena during that time. For that reason, our dataset focuses on conspiracy theories. The more narrow focus allows a more precise characterization of the contents that was being spread.
To a large extent, misinformation such as conspiracy theories is ultimately inconsequential, but some of it has the potential to cause real-world harm and due to the massive amount of social media contents, it is essentially impossible to find all harmful misinformation manually. Thus, conventional fact-checking can typically only counteract misinformation narratives after they have gained significant traction. To provide warnings in advance, automated systems are needed. However, the automatic detection of misinformation narratives is very challenging. The texts that spread misinformation may be short messages on Twitter and they often transmit misinformation by relying on context and implication rather than by stating counterfactual information outright, and satirical messages complicate the issue further.
To train automated systems, several different misinformation datasets have been released in the past. However, most have only true/false annotations, such as the ISOT dataset [7], or annotations on a scale from true to blatantly false, which is the case for the LIAR dataset [8]. Training on these datasets will not enable a machine learning model to distinguish between different misinformation narratives. This distinction is important because during the COVID-19 pandemic, many different misinformation narratives were promoted on social networks, some of them being related and some contradicting each other. To train machine learning classifiers to distinguish between narratives, we created a quality-controlled, human-annotated dataset of 3495 annotated tweets which we present in this paper. We created a total of 12 categories of conspiracy theories and labeled each tweet as belonging to one of three classes in each category for a total of 41,940 labels.
Furthermore, we give a detailed qualitative and quantitative description of the contents of the dataset and the resulting conclusions on misinformation during the COVID-19 pandemic. Due to the complexity of the multi-category annotation, understanding the contents can be helpful for further evaluation of the results of natural language processing (NLP) systems. While the primary purpose of the dataset is to train NLP models capable of detecting stances and distinguishing topics, it can also be used for further quantitative and qualitative investigation of misinformation narratives.

Dataset creation
The dataset was created in a multi-stage process, starting with the raw dataset which was created by collecting a large number of tweets related to the COVID-19 pandemic from Twitter between January 17, 2020 and Jun 30, 2021. We used the Twitter search API via our custom distributed Twitter scrapping framework called FACT [9] and targeted COVID-19 related keywords. The list of keywords is given in the Appendix. Note that this data collection is a long-running project. Therefore, the collection was not specifically geared towards the dataset described in this paper. The raw dataset contains approximately two billion statuses (i.e. tweets, retweets, quotes, and replies). We first removed retweets, quotes, and replies, leaving over 400 million tweets.

Tweet selection
Since conspiracy tweets are not particularly frequent, random sampling of the data would result in a very low number of such tweets as the number of tweets that can be labeled manually is limited. To avoid that, we use a list of keywords related to conspiracy theories and perform a text search. During the COVID-19 pandemic we observed misinformation trends and developed the list. Some keywords were chosen based on previous knowledge of conspiracy topics [10,11], while others were added because they were widely discussed, and a few were discovered in other misinformation tweets and subsequently added to the list [12].
This second list of keywords is also given in the Appendix. By applying it as a filter, we narrowed the selection to slightly more than one million tweets. We then removed tweets that contain hyperlinks. This was done because the tweet set was used during the MediaEval multimedia evaluation challenge in 2021 [13], where using the links could distract from the goal of the challenge, i.e. natural language understanding via machine learning systems.
Hyperlinks can be very valuable for understanding the intent of a tweet, and this technique has frequently been used in previous work [14,15]. However, our goal is to work towards the understanding of language rather than links. Furthermore we feel that focusing on tweets containing links may introduce bias in the analysis since links generally represent information that the users saw elsewhere, while tweet texts represent information that the users formulated themselves, even though their ideas may have been influenced from other sources. Investigating such text allows a much clearer view at the evolution of narratives over time. About half of the selected tweets contained no links.
For the remaining tweets, we attempted to resolve the self-reported location of the tweet authors. Location can be highly useful, especially since many tweets refer to the politics of the country of the author. We make use of a system to resolve locations from previous work [12]. The system is described in the Appendix. We then removed the tweets for which the location could not be resolved. This again cuts the number of tweets approximately by half. Among those, we select tweets that have a high number of characters since inferring narratives from very short tweets is impossible. This leaves a set of about 100,000 tweets. Finally, we randomly selected 3495 tweets and performed the manual labeling. The selection was done in a way that ensures that a constant proportion of the tweets was selected from every day in the dataset, to ensure an even distribution and to account for the fact that the daily number of COVID-19 related tweets was much higher in Spring 2020 than during the later stages of the pandemic. Table 1 shows the exact numbers for each step.

Manual labeling
We created 12 labels, one for each category of conspiracy theory. The categories are describe in the section "Categories". The labeling was performed by a diverse group of staff scientists, postdocs, and graduate students in computer science, media studies, and psychology. Since many tweets constitute corner cases and are difficult to label, we ensured reliability of the dataset by having three separate annotators for each tweet. Annotators were issued an initial description of the categories, which is contained in the appendix. The annotators also met regularly to discuss their understanding of the categories.
Each label is the result of a majority vote among the three annotators. In case of a triple disagreement, which can happen since there are three annotators and three classes, the project leader broke the tie. Thus, the dataset was created using 36 human annotations per tweet for a total of 125,820, with the final dataset having 41,940 consolidated annotations.
The inter-annotator agreement was 92.27% on average, varying between 98.11% and 85.61% for all categories except for the catchall category Other conspiracy theory where it was 75.85%. Because there are twelve categories, disagreement on at least one of them was quite frequent, occurring in 55.68% of all tweets. Inter-annotator agreement for each category is listed in Table 2. We used a custom web-based annotation tool to make the labeling as efficient as possible. The tool also handled multiple annotations and voting automatically. No additional information, e.g. other tweets by the same user, was taken into account during the labeling. The reason for this is to ensure the usability of the dataset to train NLP systems based on the available text and labels alone.

Classes
We created ten different categories of conspiracy topics, which are described in the section "Categories". In addition, we defined two unspecified categories to label other conspiracy theories and other misinformation. For each of the 12 categories, a tweet is labeled as belonging to one of the following three classes. Thus, every tweet has 12 separate labels which can be one of the following: 1. Unrelated The tweet is not related to that particular category. Such tweets contain conspiracy related keywords, but use them in a completely different context. 2. Related (but not supporting) The tweet is related to that particular category, but does not actually promote the misinformation or conspiracy theory. Typically the authors of such tweets point out that other believe in the misinformation. 3. Conspiracy (related and supporting) The tweet is related to that particular category, and it is spreading the conspiracy theory. This requires that the author gives the impression of at least partially believing the presented ideas. This can be expressed as a statement of fact, but also in other ways such as by using suggestive questions. It includes statements which present the misinformation as uncertain but possible or likely for statements of fact that are impossible or highly unlikely, such as microchips contained in vaccines.
Since our focus lies on detecting intentions contained in the wording, we do not consider the Related (but not supporting) category to be spreading misinformation. Of course, based on the mere exposure effect [16,17], which implies that even talking about misinformation can make it more likely for people to believe in it, a different definition is possible. In this case, the task to detect spreaders of misinformation would be far easier for natural language processing systems, since intention in this classification would not be relevant. However, to identify e.g. spreaders of disinformation, intention is important and thus it is essential to distinguish between the Related and Conspiracy classes.
While each tweet has a label in each category, in the following, we also classify entire tweets as this allows better descriptive statistics of the dataset. We consider a tweet to be a conspiracy tweet if at least one of the categories was labeled as conspiracy for it. Tweets that have no conspiracy label are considered related if at least one of the categories was labeled as related. Thus, a tweet is classified as unrelated only if it was labeled as unrelated for all twelve categories.

Categories
Since the beginning of the COVID-19 pandemic we maintained a list of circulating conspiracy theories that we regularly expanded and cross checked with those from publications by other researchers [10,11]. We then created the following categories of conspiracy theories. They combine COVID-19 specific conspiracy theories as well as older general conspiracy ideas.
As shown in previous work, existing misinformation was sometimes reinterpreted and connected to COVID-19 [12]. Therefore, we expected to find similar phenomena in this data as well. For example, New World Order has been a topic among conspiracy theorists for a long time [18], but now it is being discussed in context of COVID-19. Based on an understanding of the misinformation topics that were frequently discussed during the pandemic, we created the following broad categories: 1. Suppressed cures This category collects narratives which propose that effective medications for COVID-19 were available, but whose existence or effectiveness has been denied by authorities, either for financial gain by the vaccine producers or some other harmful intent, including ideas from other conspiracy categories listed below. It thus refers to the treatment of COVID-19, irrespective of its origin. 2. Behavior control In this category we collected narratives containing the idea that the pandemic is being exploited to control the behavior of individuals, either directly through fear, through laws which are only accepted because of fear, or through techniques which are impossible with today's technology, such as mind control through microchips.

Anti vaccination
We collect all statements that suggest that the COVID-19 vaccines serve some hidden nefarious purpose in this category. Examples include the injection of tracking devices, nanites or an intentional infection with COVID-19. This category does not include concerns about vaccine safety or efficacy, or concerns about the trustworthiness of the producers, since these are not conspiracies, even though they may contain misinformation. Furthermore, we do not consider forced vaccination a conspiracy narrative since many western countries introduced vaccine mandates for some professions and Germany and Austria, despite earlier denials [19], introduced an unsuccessful bill in early 2022 that would have made the vaccination of all citizens above the age of 18 mandatory [20,21]. 4. Fake virus Prominent narratives that surfaced early in the pandemic were that there is no COVID-19 pandemic or that the pandemic is just an over-dramatization of the annual flu season. Typically, the claimed intent is to deceive the population in order to hide deaths from other causes, or to control the behavior of the population through irrational fear. 5. Intentional pandemic This straightforward narrative posits that the cause of the pandemic is purposeful human action pursuing some illicit goal. It thus produces a culprit for the situation. Note that this is distinct from asserting that COVID-19 is a bioweapon or discussing whether it was created in a laboratory [22] since this does not prelude the possibility that it was released accidentally, which would not produce a culprit and thus not qualify as a conspiracy theory. 6. Harmful radiation This class of conspiracy theories bundles all ideas that connect COVID-19 to wireless transmissions, especially from 5 G equipment. This was done by claiming for example that 5 G is deadly and that COVID-19 is a coverup, or that 5 G allows mind control via microchips injected in the bloodstream. As 5 G misinformation has already been studied separately [12,23,24], it was not the focus of this dataset but it is included nonetheless since it is related to other conspiracy theories. 7. Depopulation Conspiracy theories on population reduction or population growth control suggest that either COVID-19 or the vaccines are being used to reduce population size, either by killing people or by rendering them infertile. In some cases, this is directed against specific ethnic groups. These narratives often use the term "population control" in the sense of population size control which needs to be distinguished from population behavior control covered in other conspiracy theories. 8. New world order New World Order (NWO) is a preexisting conspiracy theory which deals with the secret emerging totalitarian world government [25]. In the context of the pandemic, this usually means that COVID-19 is being used to bring about this world government through fear of the virus or by taking away civil liberties, or some other, implausible ideas such as mind control. 9. Esoteric misinformation Previous work on 5 G-related COVID-19 misinformation [12,23,24] showed that truly esoteric ideas concerning spiritual planes played a significant role in the initial weeks of the pandemic. The category was included to determine whether such connections also exist for other conspiracy narratives. Since the ideas behind these statements are often unclear, we do not strictly require them to be conspiracy theories. Note that conventional faithbased statements such as "praying for the pandemic to end" do not fall into this category. 10. Satanism This category collects narratives in which the perpetrators are alleged to be some kind of satanists, perform objectionable rituals, or make use of occult ideas or symbols. Such conspiracy narratives may involve harm or sexual abuse of children, such as the idea that global elites harvest adrenochrome from children to extend their own lives (Adrenochrome is a byproduct of the oxidation of adrenaline and has no such properties). Many of these ideas predate COVID-19, but they have been reinterpreted in the new context of the pandemic. While the concrete allegations differ, they have in common that they connect the alleged perpetrators to the representation of evil, and thus paint a picture of them as someone to be opposed at all cost. 11. Other conspiracy theory We added a catchall category for tweets that interpret other known conspiracy theories in the light of COVID-19 or connect some of the above categories to preexisting conspiracy theories, for example claiming the existence of a deep state which is the perpetrator of an Intentional pandemic or some other sinister plot. 12. Other misinformation A final catchall category for tweets containing substantial misinformation that does not fulfill the requirements of a conspiracy theory. Only misinformation that does not belong to any such conspiracy theory is labeled here separately, such as incorrect statements about COVID-19 that are not connected to any perpetrator or purpose. Note that this constitutes a flagging of rather obvious misinformation rather than a fine-grained fact checking of every single statement, which would be beyond the scope of this paper.

Quantitative dataset description
In this section we give a quantitative overview over the dataset. We start with the number of times each label was assigned, which is given in Table 2. Overall refers to the classification of each entire tweet, as described in the section "Classes". Thus, it is the number of tweets that were assigned at least one conspiracy class label, at least one related class label but no conspiracy label, or only unrelated labels, respectively, and not the sum of the previous entries in each column. Naturally, most tweets are unrelated to most categories, but since a tweet is considered a conspiracy tweet if it promotes misinformation in any category, 1797 out of 3495, i.e. 51%, belong to this class.

Connections between keywords and categories
We give an overview over the connections between pairs of keywords, pairs of categories, and keyword-category pairs in several tables in the appendix. Table 4 shows the number of times keywords are mentioned in the same tweet. Since this is based on text search alone and requires no manual annotation, we extend the search to the Contain no link set mentioned in Table 1. We restricted the table to the 36 keywords with a meaningful number of occurrences and co-occurrences. We observe that especially the QAnon-related keywords have a substantial number of co-mentions. Next, we show a similar statistic for the categories in Table 5. It illustrates which classes frequently occur together, such as Anti vaccination and Behavior control or Intentional pandemic, New world order, and Depopulation. In the section "Goal narratives" we discuss how these combined categories create specific conspiracy ideas. Table 6 contains a combination of the above two statistics, showing how often conspiracy tweets from each category contain the different keywords. Some of these connections are obvious since the keywords are identical or almost identical to the category name, but others are more unexpected. For example, the word plandemic is used in both the Fake virus and Intentional pandemic category, but it has a different meaning in there. The numbers can also be used to gauge how much the use of the keywords is correlated with tweets carrying misinformation.

Location analysis
As stated in the section "Tweet selection", all tweets contain a self-reported location which we transform to a country/state pair that can be evaluated automatically. The technique is based on querying the Google geocoding API. It was used and explained in previous work [12]. The left side of Fig. 1 shows the global results. As the keywords we used are predominantly based on misinformation narratives from the US, e.g. QAnon, it is expected that more than two thirds of the tweets come from there. Furthermore, since the keywords are in English, only English-speaking countries appear frequently among the locations. While the US has the most tweets per inhabitant (7.2 per million), Canada is not far behind with 5.8, followed by UK and Ireland with 4.8, and Australia and New Zealand with 3.3. India, Nigeria, and South Africa have a far lower rate. This is also expected since these countries have lower Internet and Twitter usage, and they frequently use languages other than English. All other countries have even lower rates, and thus hardly affect the overall picture.
Due to the strong US focus, we also analyze locations at the state level, as shown on the right side of Fig. 1. About 14% of the US users did not specify a state. For the rest, the number of tweets follows quite closely the population size of the state, with the two notable exceptions being Arizona and the District of Columbia, which has about 2% of the US tweets despite its small population (about 0.2% of the US total).
In Fig. 2 we provide the same statistics for the conspiracy tweets only. We observe that among these, the US is even more dominant (72 vs. 68%) while India, Nigeria, and South Africa are less represented. This is to be expected since the conspiracy narratives are focused on the US.
Among the larger US states, only Florida shows a meaningful difference compared to the overall numbers (7 vs. 6%), and South Carolina and Missouri make it to the top 12 instead of Illinois and Colorado, with South Carolina having the highest rate of conspiracy tweets (28 out of 34). Considering that most of the narratives are pro Trump/Republican and anti-Democrat, it is to be expected that Republicanleaning states have a higher rate of conspiracy tweets, but the data does not show a consistent effect here. More noticeable is the fact that while the percentages for the larger states are almost the same in Figs. 1 and 2, among the conspiracy tweet authors far fewer specify smaller states (27 vs 34%) and far more only specify the US (20 vs 14%). The total number of users covered here is 3094, which means that most users only wrote one tweet in the dataset.

Distribution over time
Finally, we show the distribution of the categories over time. Figure 3 shows the fraction for the conspiracy tweets on a monthly basis. Table 7 in the Appendix gives the corresponding absolute numbers.

Fig. 2 Distribution of conspiracy tweets by country and US state
We observe that Anti vaccination is by far the most prominent topic, and it remains relatively consistent in size. New world order is also quite large and consistent. Both topics have had a sizeable presence before the COVID-19 pandemic. Therefore, it is not surprising that they are quite prominent in early 2020, before the pandemic fully arrived in the US. Other topics such as Depopulation or Intentional pandemic gain popularity during the pandemic. However, Depopulation seems to shrink in 2021.
We perform the same analysis for the related tweets. Figure 4 shows again the fraction on a monthly basis while Table 8 in the Appendix gives the corresponding numbers.
The picture is quite different here, with Intentional pandemic, Harmful radiation, and Depopulation being much more present than other categories. Clearly, there is a difference between the topics discussed by proponents of conspiracy theories and other Twitter users. For topics that vary widely over time, one might expect a time lag where tweets that are related to categories continue to appear long after the topic lost interest among conspiracy circles. Harmful radiation Fig. 3 The fraction of conspiracy tweets per category over time. From Table 7 Fig. 4 The fraction of conspiracy related tweets per category over time. From Table 8 would be a good example since the topic quickly lost popularity among western Twitter users [12] in the second half of 2020.
However, since we did not include 5 G as a keyword, this hypothesis cannot be confirmed from our dataset. Also, note that the individual numbers by month and category are very small and do not constitute a basis for statistically robust analysis.

Qualitative analysis of the narratives
The spread of many conspiracy ideas differs from a typical information cascade because the information mutates along the way. Thus, each of the categories in the section "Categories" which we use for quantitative study has a large number of variations to the exact narrative. It is not feasible to study these quantitatively, but they can be investigated qualitatively. Thus, the objective of this section is to provide more detail on the exact narratives that are frequently found in the tweets of this dataset.
An important feature of many categories of conspiracy narratives is that they can contain mutually exclusive narratives without seemingly weakening the impact of the category as a whole. This was observed for 9/11 conspiracy theories [3], as well as for 5 G-COVID misinformation [12].
Conspiracy theories need a perpetrator, means, and a goal, although sometimes one of the components can be rather nondescript. Among the COVID-19 conspiracies, means were quite prominent, and the first six categories deal primarily with means. On the other hand, Depopulation and New World Order are goals, and typically some aspect of COVID-19 or the vaccines are the corresponding means. Satanism ostensibly identifies a perpetrator, i.e. satanists, although this carries little weight since anyone could secretly be a satanist. The category focuses at least as much on means, i.e. rituals involving harm or abuse of children. Of the remaining three categories, Other conspiracy theory collects previously known conspiracies which are a mix of means, e.g. chemtrails, perpetrators, e.g. deep state, and goal, e.g. great replacement. See Moffit et al. for a more detailed discussion on the structure of conspiracy theories [5]. The Esoteric misinformation category is unclear, i.e. it does not present identifiable common narratives, and Other misinformation does not follow this structure at all. Thus, for training machine learning models, we recommend to exclude the last there categories since they do not provide identifiable narratives.

Political goals
Among the most common suspected agendas is the idea that the pandemic serves to prevent Donald Trump from being reelected. This is typically paired with Fake virus narrative, claiming that a nonexistent pandemic is used to create a state of emergency, and sometimes it is also paired with the Intentional pandemic narrative. Here the claim is that China in collusion with the Deep State or individuals such as George Soros, released the virus to create the state of emergency. However, the opposite idea, i.e. that the state of emergency is actually a means to keep Donald Trump in power, was also present although it disappeared later in the year 2020. Many political tweets make reference to QAnon and related terms, and some also to the Trump campaign slogans MAGA and KAG (Make America Great Again/Keep America Great).
A less concrete agenda appears in the New world order category. Here, the state of emergency and the control of the behavior of the population is intended to bring about the New World Order, which is sometimes described as socialist. Infrequently, it is referred to as one world agenda or great reset, as introduced by the World Economic Forum [26]. In cases where the alleged perpetrator is the Chinese leadership, the alleged goal is often to hurt the US or the western world.
As US users are the plurality of the authors of the investigated tweets, ideas concerning politics in other countries were less common. Thus, it is more difficult to establish recurring narratives. For example, the search term population control appears frequently in India, but there it refers to the population control bill [27] rather than a Depopulation conspiracy theory.

Depopulation goals
The primary goal besides the political ones is depopulation, for which we created the Depopulation category. The narrative is sometimes straightforward: the perpetrators created an Intentional pandemic with the goal of reducing the world population. A similar narrative relies on the Fake virus and Anti vaccination idea, claiming that COVID-19 is either harmless or non-existing, but the public concern about it serves to pressure people to accept a vaccine which is deadly. The second version was more common, often with Bill Gates as the alleged perpetrator.
In developing countries, the idea of population growth control via infertility caused by a vaccine has been relatively common [28], especially with the goal of reducing population growth of specific ethnic groups. However, in developed countries population growth has all but stopped, making it much less of a public concern. Since the dataset is US/UK focused, infertility narratives were rare.

Financial goals
Some conspiracy theories claim financial motivations of the perpetrators, although they are far less frequent than political goals. They do appear regularly in Suppressed cures narratives, and sometimes as a motivation for an Intentional pandemic with the aim of earning money on vaccines with the alleged perpetrator usually being Bill Gates.

Imaginary goals
In addition to the above, some conspiracy theories suggest goals which are scientifically impossible. The most prominent are Mind Control narratives which we sorted in the Behavior control. Another impossible goal is contained in the Adenochrome narrative, which claims that the substance is harvested from children to prolong the life of older members of the elite. We classified this narrative under Satanism as it resembles ritual murder narratives and the authors sometimes refer to it as satanic.
There is considerable speculation about the motivations for belief in fictitious conspiracy theories. One common interpretation is that such ideas are meant figuratively [29]. For example, from this viewpoint the Adenochrome narrative could signify that older people benefit from anti-COVID measures such as lockdowns in the form of reduced risk of death from COVID-19, while the younger people predominantly pay the price in the form of lost school education or work income. However, understanding the motivations of the authors of such tweets is beyond the scope of this paper.
In the context of 5 G-COVID, a substantial amount of Esoteric misinformation was found in previous work [12] which was suggesting imaginary goals. While some of the keywords we used here cover similar topics (especially the mind control narrative), such posts were exceedingly rare in this dataset.

Fearmongering
For the political goals described in the section "Goal narratives", the most common alleged means was the idea that the perpetrators create unfounded fear in the population to attain their goals, using narratives from the Fake virus category. Typically, they claim that there is no (SARS-COV-2) virus, and that the perpetrators use fear of COVID-19 to make the population act according to their designs. Less often, conspiracy theories claim that fear mongering happens via an Intentional pandemic. Here, the authors do not doubt the COVID-19 fatalities, but claim hat they are a side effect and that e.g. widespread lockdowns as a result of fearing COVID-19 is the intended effect.
A common but weaker version of the Fake virus narrative was false reporting of COVID-19 numbers. The authors of such tweets do not deny the existence of COVID-19, but claim that the number of victims is far lower than the official numbers suggest, either via direct manipulation by the government, or by an alleged financial incentives for hospitals to misreport deaths as COVID-19 related. This is often combined with the claim that the remaining cases are caused by a seasonal flu rather than a pandemic.
Some related tweets containing counterstatements claimed that supporters of Donald Trump changed their message from denying the existence of COVID-19 as an independent pandemic to claiming that numbers were manipulated, which seems to be the case in our dataset.

Vaccines
Vaccines are the primary means for depopulation, financial, and many imaginary goals. Similar to 5 G, which had substantial opposition prior to COVID-19 [12], opposition to vaccines has been quite substantial before the pandemic [30]. Sometimes, they claim that Fearmongering or an Intentional Pandemic is used as a means to persuade people to accept the vaccine. In that case people taking the vaccine becomes a goal rather than a means.

Suppressed cures
Tweets discussing Suppressed cures conspiracy theories were quite infrequent. We found two recurring categories: The first deals with Hydroxychloroquine (HCQ), which was initially considered an effective treatment for COVID-19 in some countries [31] and popularized by Donald Trump [32]. Several countries that did use it ceased to do so after clinical trials showed high risks and low effectiveness [33]. The conspiracy theories claim that HCQ was abandoned either because it is not profitable for the pharmaceutical companies or to encourage people to accept dangerous vaccines instead. Thus, such narratives posit usually financial and sometimes depopulation goals. More rarely, they suggest imaginary mind control goals or support for fearmongering. The idea here is that by removing effective medications, people are more likely to be afraid of COVID-19. This narrative however is relatively rare.
In addition to HCQ, suppressed cures narratives for colloidal silver, which is an alternative medicine product, are being used to promote it as a "secret" miracle cure. Such narratives posit the same goals as other Suppressed cures tweets. However, the motivation of the tweet authors is likely to promote ineffective medications that they themselves are selling.

G, magnetism, microchips, and tagging
Imaginary goals are generally accompanied by imaginary means, i.e. means which have no scientific basis for functioning. One of the most common narratives in the dataset is the idea that COVID-19 vaccines contain microchips (and sometimes nanochips or Smartdust [34]). These chips either allow the perpetrator to control the mind of the recipients or allow tracking them via radio frequency identification (RFID). This idea is sometimes connected to the ID2020 digital identity provider.
Note that imaginary means are different from Suppressed cures, since it is conceivable that existing medications are effective against COVID-19, but imaginary means have no conceivable way of working in reality.
The tracking idea is common enough to have spawned tweets containing counter statements. Typically they observe that the ubiquitous smartphones already track their user's location, which makes tracking via implanted chips obsolete.
For the outlandish idea that COVID-19 vaccines render the user magnetic, we only found counter-statements. It is likely that this idea commanded much more attention among users of mainstream media than among proponents of conspiracy theories.

Intentional pandemic
The main narrative in this class claims that COVID-19 is a bioweapon developed in Wuhan that was released intentionally, either to reach a political goal (usually with George Soros, Anthony Fauci, the Deep State, or the Chinese leadership as the perpetrator), or with the aim of depopulation (usually by Bill Gates). There is a substantial variety in the exact story, but due to its concrete focus on Wuhan, bioweapon, and recognizable perpetrators, this represents one of the most consistent narratives found in the dataset.
A somewhat weaker form of the Intentional pandemic narrative asserts a failure to act, either on the part of the Chinese leadership for not warning the world adequately of the developing pandemic or by Donald Trump w.r.t. to the US pandemic response. What makes these statements relevant in the context of conspiracy theories is that they assert malice on the part of the acting entity. Some of these tweets contain extreme statements, such as "Trump murdered 150,000 people". The mark is associated with proof of vaccination, which in some countries was required to enter stores during lockdowns in 2021 [35]. Some conspiracy tweets refer to the mark as implanted chips rather than the proof of vaccination systems that were actually used. In either case, putting COVID measures into an eschatological context and calling them a tool of the devil provides a narrative that justifies opposition to the measure. The mark is always presented as a means for exerting control over the population.

Deep State
The Deep State turned out to be one of the most frequent perpetrators. Many tweets that contain QAnon or related keywords mention it, usually with political goals using Fearmongering as a means. While the Deep State is generally not explained further within the tweets, it is often linked to, or even used as a proxy, for the Democratic party in the US.

George Soros and globalists
George Soros is a frequent target of right-wing conspiracy theories [36]. In our dataset, he was mostly mentioned as the perpetrators of political goal conspiracies, such as plots to prevent the reelection of Donald Trump, or the establishment of a New world order. He is usually mentioned along with the Deep State. The term globalists is often used in conjunction with Soros, or sometimes as perpetrators of similar conspiracy narratives.

Anthony Fauci and Bill Gates
Both names appear frequently as perpetrators of the alleged conspiracies. While Soros was a keyword in our search, Bill Gates and Anthony Fauci were not but they appeared frequently anyway. Bill Gates is usually the alleged perpetrator of Anti vaccination and Depopulation conspiracies. These conspiracy ideas are widely known [37]. While the focus on vaccines and depopulation in tweets mentioning Gates can be explained by the work of the Bill and Melinda Gates foundation, the association with microchips is less obvious. We suspect that among multiple narratives, it had greater fluency [38] due to the strong association between Gates and the word Microsoft.
Fauci is typically mentioned in connection with the Wuhan Institute of Virology, as the alleged sponsor or initiator of the development of SARS-COV-2 (usually referred to as a bioweapon in such tweets), usually acting on behalf of the Deep State.

Donald Trump
As mentioned in the section "Goal narratives", some conspiracy theories suspected that COVID-19 is a plot to ensure that Donald Trump would remain US president after 2020. More extreme statements claim that Donald Trump intentionally let COVID-19 spread, thereby intentionally letting a large number of US citizens die. These however are relatively rare. Conspiracy theories focusing on Donald Trump were the only recurring narrative among the rare pro-Democrat conspiracies.

China
China and the Chinese leadership is frequently mentioned as a perpetrator in Intentional pandemic narratives. Sometimes the claim is that China was working together with Gates, Soros, or the Deep State. Other tweets claim that China was acting alone in order to damage the western world. Furthermore, China is frequently accused of having developed a bioweapon (i.e. COVID-19) in the dataset. However, we did not count such tweets as spreading an Intentional pandemic narrative unless they also claim that the bioweapon was released on purpose.

Powerful organizations
Sometimes groups or organizations that are perceived as influential or powerful appear as perpetrators. These include the Illuminati, Freemasons, the Rockefeller Foundation, the World Economic Forum, and the Rothschild family. They usually appear as perpetrators of Intentional pandemic or New world order conspiracies. However, in this dataset they far less common that the alleged perpetrators mentioned above.

Aliens
Also commonly connected to conspiracy theories [39], a small number of tweets (less than 1% ) makes reference to aliens. However, they do not promote a unified and recognizable narrative.

Counting perpetrator mentions
We count the number of conspiracy tweets mentioning the frequent perpetrators and show the numbers by category in Table 3. The names are based on the caseinsensitive search string. While Trump appears most often, both Donald Trump and China are often mentioned in other contexts than being perpetrators of a conspiracy. Thus, Bill Gates is most often listed as a perpetrator. His sum/total ratio is also the highest, which means that he is frequently associated with multiple conspiracy theories, typically Anti vaccination, Behavior control, Depopulation, or Intentional pandemic.

Conspiracy detection
The presented dataset served as the basis for the MediaEval Challenge 2021.
MediaEval is a benchmark that provides standardized task descriptions, data sets, and evaluation procedures for the multimedia research community. The benchmark aims to make possible systematic comparison of the performance of different approaches to problems of retrieving or analyzing multimedia content. The goal is to identify state-of-the-art algorithms and promote research progress. In the following, we summarize the most important results of the MediaEval FakeNews: Corona Virus and Conspiracies Task 2021 [13]. The task includes three subtasks. The first subtask is text-based fake news detection. Here, participants are asked to build a multi-class classifier that can flag tweets that promote or support the presented conspiracy theories.
The second subtask is the detection of conspiracy theory topics, where the goal is to build a detector that can detect whether a text refers to any of the predefined conspiracy topics.
The third subtask is the combined misinformation and conspiracy detection, where the goal is to build a complex multi-labelling multi-class detector that can predict whether a tweet promotes or supports a particular topic from a list of predefined conspiracy topics.
Despite a large number of promising results [40][41][42] and partly creative approaches [43], the transformer-based approaches [44][45][46], particularly CT-BERT [47], performed the best. In the following, we briefly summarize the results of the winning group [48]. The proceedings of the MediaEval Challenge 2021 including the work of all participants is available at https:// ceur-ws. org/ Vol-3181/.
The authors evaluated three different approaches for each of the subtasks 1, 2, and 3. First, a term frequency-inverse document frequency based approach in which the features were subsequently fed into different supervised learning algorithms. In Task 1, the classifiers were used in a multi-class asset. In the multilabel case of Task 2, the authors used a multi-output classifier.
Second, pre-trained language models that are then fine-tuned on the task of Natural Language Inference were leveraged. Or in other words, given two statements (a premise and a hypothesis), these models are trained to classify the logical relationship between both of them: entailment (agreement or support), contradiction (disagreement), or neutrality (undetermined).
Thirdly, the authors proposed using transformer-based models, specifically RoBERTa and COVID-TwitterBERT to perform classification with a weighted Cross Entropy loss function.
All the models were evaluated on a stratified 5-fold cross-validation set and then evaluated on the test set. Furthermore, Transformer-based approaches delivered the best results. Here, CT-BERT delivered the most competitive results with an Matthews correlation coefficient of 0.720, 0.774, and 0.775 for tasks 1, 2 and 3.

Related work
In the last four years, a significant body of work has proposed methods for automatic fake news detection. The work covers a wide range of approaches, including knowledge graphs and spreading models in addition to natural language processing.
Perez-Rosas et al. [49] present a systematic approach for detecting fake news using natural language processing techniques. A key contribution of their work is the introduction of two novel datasets covering seven different news domains, which allows for a more comprehensive evaluation of their proposed methods. The authors introduce classification models that rely on a combination of lexical, syntactic, and semantic information, as well as features representing text readability properties. Experimental results show that the proposed models were able to achieve satisfactory levels of accuracy in detecting fake news, with the best performing models reaching accuracies that are comparable to human ability to spot fake content.
Le et al. [50], addresses the question of what would happen if adversaries attempted to attack automated detection models for fake news. To this end, they introduce MALCOM, an end-to-end adversarial comment generation framework that allows for attacking such models. Through a comprehensive evaluation, the authors demonstrate that on average, MALCOM can successfully mislead five of the latest neural detection models to always output targeted real and fake news labels approximately 94% and 93.5% of the time, respectively. Limeng Cui et al. [51], proposes a method for detecting misinformation in the healthcare domain. They introduce a knowledge-guided graph attention network called DETERRENT which utilizes domain-specific knowledge and graph structure to improve the performance of misinformation detection for the medical sector.
Beer et al. [52] conduct a systematic literature review to identify the main approaches for identifying fake news, such as different situations these approaches can be applied in, with examples, challenges and appropriate context in which an approach can be used. This work highlights the importance of tackling the problem of fake news as it can have a range of consequences, from being annoying to influencing and misleading societies or even nations.
Giachanou et al. [53], present a method for detecting conspiracy theories in social media using a combination of natural language processing techniques and psycholinguistic characteristics. The author utilized a dataset of tweets related to conspiracy theories and used this data to train a machine learning model that can identify conspiracy propagators based on specific linguistic patterns. The model outperformed other state-of-the-art baselines in terms of performance. The author also highlighted the advantage of using psycho-linguistic characteristics for detecting conspiracy theorists, where it can provide more insights into the nature of conspiracy theories and the personalities of their propagators.
Rangel et al. [54,55] present the results of the 8th International Author Profiling Shared Task at PAN 2020, which focused on identifying Twitter authors who spread fake news in English and Spanish. The participants used different features, including ngrams, stylistics, personality and emotions, and embeddings. They employed machine learning algorithms such as Support Vector Machines and Logistic Regression, and few participants used deep learning techniques like Fully-Connected Neural Networks, CNN, LSTM and Bi-LSTM with self-attention. The results showed that traditional machine learning approaches obtained higher accuracy than deep learning ones. The six top-performing teams used combinations of n-grams with traditional machine learning algorithms, and the best results were obtained in Spanish and English. The paper also highlights that the highest confusion in English is from Real News spreaders to Fake News Spreaders, while in Spanish is the other way around, from Fake News Spreaders to Real News Spreaders. The paper concludes that it is possible to automatically identify potential Fake News Spreaders on Twitter with high precision, but the high rate of false positives highlights the importance of careful error analysis.
These methods generally rely on labeled datasets. Consequently a variety of misinformation datasets have been published in the recent years.
Wang et al. [8] present LIAR: a publicly available dataset for fake news detection collected over the time span of a decade. The dataset includes approx. 12K manually labeled short statements in various contexts from politifact.com, which provides detailed analysis report and links to source documents for each case.
Nabil et al. [56] present a Twitter dataset for Arabic language sentiment analysis, called ASTD. The dataset comprises around 10,000 tweets, categorized into four classes: objective, subjective positive, subjective negative and subjective mixed.
Salem et al. [57] created the first dataset focused on fake news surrounding the conflict in Syria. The authors have also built fully-supervised machine-learning models for detecting fake news and testing it on news articles related to the Syrian war as well as other fake news datasets.
Dai et al. [58] introduce a data set called FakeHealth that aims to facilitate research in the area of health fake news. The data repository contains two featurerich datasets that include a large amount of news content, social engagements, and user-user social networks. The authors conduct exploratory analyses to show the characteristics of the datasets and identify potential patterns and challenges in detecting fake health news.
Shu et al. [59] present a data set called FakeNewsNet that aims to facilitate research in the area of fake news detection. The repository contains a large amount of data collected from news content, social context, and spatiotemporal information. The authors also conduct a preliminary exploration of the various features in Fake-NewsNet and demonstrate its utility by using it in a fake news detection task against several state-of-the-art baselines.
A comprehensive overview over the different datasets was provided in recent work [60]. Furthermore, in a recent survey, fake news spreading was studied together with polarisation dynamics and bots [61].
As COVID-19 misinformation has attracted substantial attention from the research community, several datasets dealing specifically with this topic have been published recently [60,62]. Darius and Urquhart [63] specifically study conspiracy theories related to COVID-19. However, unlike out dataset, they rely on hashtags rather than human annotation.
We also created a Twitter dataset dealing specifically with 5 G-related COVID-19 misinformation, as well as the retweet graphs of such tweets [23,24]. The dataset, wich is called WICO (WIreless COnspircacy) was used in the MediaEval 2020 challenge on fake news detection [64]. It also served as the foundation of an analysis focusing on the 5 G-COVID phenomenon [12]. The MediaEval 2020 fake news detection task closely resembles stance classification [65]. Furthermore, there are many competitions that provide datasets to evaluate language technology, e.g. CLEF [66,67] and SemEval [68,69].
COCO, our new dataset, distinguishes 12 categories of conspiracy narratives rather than focusing on 5 G and COVID-19 alone. Due to the intense coverage of this misinformation category, we excluded 5 G from the search terms in the new dataset. An earlier version containing parts of the new dataset was used in in the MediaEval 2020 challenge on fake news detection [64], where the objective was to train and evaluate machine learning classifiers based on this data. Several participating teams achieved strong results [70]. Thus, our contribution resembles multinarrative datasets such as Emergent [71].

Conclusion
We have presented a new human-labeled misinformation dataset connected to COVID-19 related conspiracy theories. Unlike many previous datasets which only differentiate between true and false information, we label the tweets to distinguish different conspiracy narratives, as well as tweets related to but not promoting such narratives.
This means that conspiracy and non-conspiracy tweets will often use similar words. Thus, obtaining high accuracy when training NLP models to distinguish between both classes becomes harder. They can no longer rely on differences in word frequency, which causes difficulties for methods such as TF-IDF [72]. Instead, they have to analyze the meaning. While BERT-based approaches [44] worked reasonably well in the MediaEval2021 challenge, it was observed that BERT sometimes struggles with negations [73] which are common in the Related category.
In addition, the distinction between the Conspiracy and Related classes allows further analysis of the spread of conspiracy narratives. There is a meaningful difference between categories such as Anti vaccination, which have many tweets in the Related class and Depopulation, which has few, as shown in Table 2. This allows further investigation into the question whether publicly discussing conspiracy theories without promoting their contents nonetheless increases the number of people who believe in them.
The dataset is made publicly available. However, following Twitter's terms of service, the text of the tweets is not contained in the dataset. In future work, we will use the dataset to train advanced machine learning classifiers and apply them to the entire set of tweets. In this manner we will gain a detailed picture about the spread of the different conspiracy narratives during the COVID-19 pandemic.

Appendix A: COVID-19 search keywords
The list of search English keywords for obtaining the initial set of COVID-19 related tweets is as follows: #corona

Appendix B: Misinformation search keywords
The search was performed concurrently with searches in other languages. However, only English tweets were included in the dataset. To select the candidate tweets for annotation, we used the following list of case-insensitive keywords. The last five entries are pairs of words connected by logical AND, i.e., only tweets containing both word in either order were selected. Most keywords were chosen based on previous knowledge. Others were identified in the dataset and subsequently included. Table 4 Co-mentions of keywords in the tweet set after removing links with a total of 514,716 tweets         We removed very rare and combined keywords that had essentially no co-mentions due to limited space. In general, tweets mentioning multiple keywords are uncommon, but some clusters of frequently QANON-related terms, HAARP,geoengineering, and chemtrails, and microchips, implant, and mark of the beast.

Table 5
The main diagonals show the number of times each label was assigned The rest of the table shows how many tweets were assigned the corresponding combination of labels. The upper part refers to the conspiracy label, the lower part to the related label    Only tweets that are labeled as conspiracy for each given category are counted here   Table 7 The number of conspiracy tweets per category over time  Table 8 The number of related tweets per category over time COVID-19 were available, but whose existence or effectiveness has been denied by authorities, either for financial gain by the vaccine producers or some other harmful intent. 2. Behavior control Narratives containing the idea that the pandemic is being exploited to control the behavior of individuals, either directly through fear, through laws which are only accepted because of fear, or through techniques which are impossible with today's technology, such as mind control through microchips. 3. Anti vaccination Narratives that suggest that the COVID-19 vaccines serve some hidden nefarious purpose in this category. Examples include the injection of tracking devices, nanites or an intentional infection with COVID-19, but not concerns about vaccine safety or efficacy, or concerns about the trustworthiness of the producers. 4. Fake virus Narratives saying that there is no COVID-19 pandemic or that the pandemic is just an over-dramatization of the annual flu season. Example intent is to deceive the population in order to hide deaths from other causes, or to control the behavior of the population through irrational fear. 5. Intentional pandemic Narratives claiming that the pandemic is the result of purposeful human action pursuing some illicit goal. Does not include asserting that COVID-19 is a bioweapon or discussing whether it was created in a laboratory since this does not prelude the possibility that it was released accidentally. 6. Harmful radiation Narratives that connect COVID-19 to wireless transmissions, especially from 5 G equipment, claiming for example that 5 G is deadly and that COVID-19 is a coverup, or that 5 G allows mind control via microchips injected in the bloodstream. 7. Depopulation Conspiracy theories on population reduction or population growth control suggest that either COVID-19 or the vaccines are being used to reduce population size, either by killing people or by rendering them infertile. In some cases, this is directed against specific ethnic groups. 8. New world order New World Order (NWO) is a preexisting conspiracy theory which deals with the secret emerging totalitarian world government. In the context of the pandemic, this usually means that COVID-19 is being used to bring about this world government through fear of the virus or by taking away civil liberties, or some other, implausible ideas such as mind control. 9. Esoteric misinformation Truly esoteric ideas concerning spiritual planes etc.
Note that conventional faith-based statements such as "praying for the pandemic to end" do not fall into this category. 10. Satanism Narratives in which the perpetrators are alleged to be some kind of satanists, perform objectionable rituals, or make use of occult ideas or symbols. May involve harm or sexual abuse of children, such as the idea that global elites harvest adrenochrome from children. 11. Other conspiracy theory Catchall category for tweets that interpret other known conspiracy theories in the light of COVID-19 or connect some of the above categories to preexisting conspiracy theories. 12. Other misinformation Catchall category for tweets containing substantial misinformation that does not fulfill the requirements of a conspiracy theory. Only include obvious misinformation.
you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.